Large-scale log analysis of digital reading
In this paper, we address the daily reading practices of the general public in Russia by analyzing 10 months of log data from the commercial ebook site Bookmate. We study different reading characteristics with ebooks, i.e. the reading volume and preferences, reading schedule, reading speed and reading style (including parallel reading patterns and book abandonment rates), with respect to reader gender, book length and genre of the book. We find that book genres impact certain reading behaviors, while gender differences or book length seem to play less of a role in ebook reading. Parallel book reading and book abandonment occur very frequently, possibly pointing towards changing reading behaviors in the ebook environment. The obtained insights demonstrate the high potential of log analysis for book reading studies. Copyright © 2016 by Association for Information Science and Technology
Protein Models: The Grand Challenge of protein docking
Characterization of life processes at the molecular level requires structural details of protein–protein interactions (PPIs). The number of experimentally determined protein structures accounts only for a fraction of known proteins. This gap has to be bridged by modeling, typically using experimentally determined structures as templates to model related proteins. The fraction of experimentally determined PPI structures is even smaller than that for the individual proteins, due to a larger number of interactions than the number of individual proteins, and a greater difficulty of crystallizing protein–protein complexes. The approaches to structural modeling of PPI (docking) often have to rely on modeled structures of the interactors, especially in the case of large PPI networks. Structures of modeled proteins are typically less accurate than the ones determined by X-ray crystallography or nuclear magnetic resonance. Thus the utility of approaches to dock these structures should be assessed by thorough benchmarking, specifically designed for protein models. To be credible, such benchmarking has to be based on carefully curated sets of structures with levels of distortion typical for modeled proteins. This article presents such a suite of models built for the benchmark set of the X-ray structures from the Dockground resource (http://dockground.bioinformatics.ku.edu) by a combination of homology modeling and the Nudged Elastic Band method. For each monomer, six models were generated with predefined Cα root mean square deviation from the native structure (1, 2, ..., 6 Å). The sets and the accompanying data provide a comprehensive resource for the development of docking methodology for modeled proteins.
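The Cα root mean square deviation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the model and the native structure are already superimposed and that their Cα atoms are paired one to one; the function name `ca_rmsd` and the toy coordinates are hypothetical and are not taken from the Dockground resource.

```python
import math


def ca_rmsd(model, native):
    """Root mean square deviation between paired C-alpha coordinates.

    Both arguments are lists of (x, y, z) tuples in angstroms.
    Assumption: the two structures are already superimposed, so no
    optimal rotation or translation is computed here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared coordinate differences over all paired atoms.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, native)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


# Toy example: every model atom is shifted by 1 angstrom along y.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(ca_rmsd(model, native))  # 1.0
```

In practice the deviation is usually reported after an optimal superposition of the two structures, which this sketch leaves out.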
Structural templates for comparative protein docking
Structural characterization of protein-protein interactions is important for understanding life processes. Because of the inherent limitations of experimental techniques, such characterization requires computational approaches. Along with the traditional protein-protein docking (free search for a match between two proteins), comparative (template-based) modeling of protein-protein complexes has been gaining popularity. Its development puts an emphasis on full and partial structural similarity between the target protein monomers and the protein-protein complexes previously determined by experimental techniques (templates). The template-based docking relies on the quality and diversity of the template set. We present a carefully curated, non-redundant library of templates containing 4,950 full structures of binary complexes and 5,936 protein-protein interfaces extracted from the full structures at 12 Å distance cut-off. Redundancy in the libraries was removed by clustering the PDB structures based on structural similarity. The value of the clustering threshold was determined from the analysis of the clusters and the docking performance on a benchmark set. High structural quality of the interfaces in the template and validation sets was achieved by automated procedures and manual curation. The library is included in the Dockground resource for molecular recognition studies at http://dockground.bioinformatics.ku.edu
Protein Model Docking Benchmark 2
Structural characterization of protein-protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template-free or template-based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high-resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have pre-defined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model-to-native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the “real case scenario,” as opposed to the previous set, where a significant number of structures were model-like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu
Simulated unbound structures for benchmarking of protein docking in the dockground resource
Background
Proteins play an important role in biological processes in living organisms. Many protein functions are based on interaction with other proteins. The structural information is important for adequate description of these interactions. Sets of protein structures determined in both bound and unbound states are essential for benchmarking of the docking procedures. However, the number of such proteins in PDB is relatively small. A radical expansion of such sets is possible if the unbound structures are computationally simulated.
Results
The dockground public resource provides data to improve our understanding of protein–protein interactions and to assist in the development of better tools for structural modeling of protein complexes, such as docking algorithms and scoring functions. A large set of simulated unbound protein structures was generated from the bound structures. The modeling protocol was based on 1 ns Langevin dynamics simulation. The simulated structures were validated on the ensemble of experimentally determined unbound and bound structures. The set is intended for large scale benchmarking of docking algorithms and scoring functions.
Conclusions
A radical expansion of the unbound protein docking benchmark set was achieved by simulating the unbound structures. The simulated unbound structures were selected according to criteria from systematic comparison of experimentally determined bound and unbound structures. The set is publicly available at http://dockground.compbio.ku.edu
Mass Spectrometry Based Molecular 3D-Cartography of Plant Metabolites
Plants play an essential part in global carbon fixing through photosynthesis and are the primary food and energy source for humans. Understanding them thoroughly is therefore of highest interest for humanity. Advances in DNA and RNA sequencing and in protein and metabolite analysis allow the systematic description of plant composition at the molecular level. With imaging mass spectrometry, we can now add a spatial level, typically in the micrometer-to-centimeter range, to their compositions, essential for a detailed molecular understanding. Here we present an LC-MS based approach for 3D plant imaging, which is scalable and allows the analysis of entire plants. We applied this approach in a case study to pepper and tomato plants. Together with MS/MS spectra library matching and spectral networking, this non-targeted workflow provides the highest sensitivity and selectivity for the molecular annotations and imaging of plants, laying the foundation for studies of plant metabolism and plant-environment interactions
Quantum algorithm and circuit design solving the Poisson equation
The Poisson equation occurs in many areas of science and engineering. Here we
focus on its numerical solution for an equation in d dimensions. In particular
we present a quantum algorithm and a scalable quantum circuit design which
approximates the solution of the Poisson equation on a grid with error
\varepsilon. We assume we are given a superposition of function evaluations of
the right hand side of the Poisson equation. The algorithm produces a quantum
state encoding the solution. The number of quantum operations and the number of
qubits used by the circuit is almost linear in d and polylog in
\varepsilon^{-1}. We present quantum circuit modules together with performance
guarantees which can also be used for other problems.
Comment: 30 pages, 9 figures. This is the revised version for publication in
New Journal of Physics
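For orientation, the grid problem that the abstract above refers to can be sketched classically in one dimension. This is not the quantum algorithm of the paper: it is a minimal finite-difference illustration, assuming the equation -u''(x) = f(x) on (0, 1) with zero boundary values, solved with the Thomas algorithm for tridiagonal systems. The function name `solve_poisson_1d` is hypothetical.

```python
def solve_poisson_1d(f, n):
    """Solve -u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0.

    Uses n interior grid points and the standard second-order stencil
    (2*u[i] - u[i-1] - u[i+1]) / h**2 = f(x[i]), solved with the
    Thomas algorithm. Returns the values of u at the interior points.
    """
    h = 1.0 / (n + 1)
    rhs = [h * h * f((i + 1) * h) for i in range(n)]

    # Forward elimination for the tridiagonal matrix with 2 on the
    # diagonal and -1 on both off-diagonals.
    c = [0.0] * n  # modified upper-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    c[0] = -1.0 / 2.0
    d[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom

    # Back substitution.
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u


# With f(x) = 2 the exact solution is u(x) = x * (1 - x).
u = solve_poisson_1d(lambda x: 2.0, 3)
print(u)
```

The classical cost of this approach grows exponentially with the dimension d, since the number of grid points does; the abstract states that the quantum circuit instead scales almost linearly in d.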
A Novel Combined Term Suggestion Service for Domain-Specific Digital Libraries
Interactive query expansion can assist users during their query formulation
process. We conducted a user study with over 4,000 unique visitors and four
different design approaches for a search term suggestion service. As a basis
for our evaluation we have implemented services which use three different
vocabularies: (1) user search terms, (2) terms from a terminology service and
(3) thesaurus terms. Additionally, we have created a new combined service which
utilizes thesaurus terms and terms from a domain-specific search term
recommender. Our results show that the thesaurus-based method is clearly used
more often than the other single-method implementations. We interpret
this as a strong indicator that term suggestion mechanisms should be
domain-specific to be close to the user terminology. Our novel combined
approach, which interconnects a thesaurus service with additional statistical
relations, outperformed all other implementations. All our observations show
that domain-specific vocabulary can support the user in finding alternative
concepts and formulating queries.
Comment: To be published in Proceedings of Theories and Practice in Digital
Libraries (TPDL), 201
Science Models as Value-Added Services for Scholarly Information Systems
The paper introduces scholarly Information Retrieval (IR) as a further
dimension that should be considered in the science modeling debate. The IR use
case is seen as a validation model of the adequacy of science models in
representing and predicting structure and dynamics in science. Particular
conceptualizations of scholarly activity and structures in science are used as
value-added search services to improve retrieval quality: a co-word model
depicting the cognitive structure of a field (used for query expansion), the
Bradford law of information concentration, and a model of co-authorship
networks (both used for re-ranking search results). An evaluation of
retrieval quality with science-model-driven services showed that
the proposed models do improve retrieval quality.
From an IR perspective, the models studied are therefore verified as expressive
conceptualizations of central phenomena in science. Thus, it could be shown
that the IR perspective can significantly contribute to a better understanding
of scholarly structures and activities.
Comment: 26 pages, to appear in Scientometrics
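The re-ranking by Bradford's law mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming that each search result carries a journal name and that documents from the journals contributing the most hits (the core journals) should be listed first. The function name `bradfordize` and the sample data are hypothetical and do not reproduce the paper's implementation.

```python
from collections import Counter


def bradfordize(results):
    """Re-rank search results so that core journals come first.

    `results` is a list of (doc_id, journal) pairs in their original
    ranking. Documents are ordered by how many hits their journal has
    in the result set; ties keep the original order because Python's
    sort is stable.
    """
    hits_per_journal = Counter(journal for _, journal in results)
    return sorted(results, key=lambda r: -hits_per_journal[r[1]])


results = [
    ("d1", "Journal A"),
    ("d2", "Journal B"),
    ("d3", "Journal B"),
    ("d4", "Journal C"),
    ("d5", "Journal B"),
]
reranked = bradfordize(results)
print([doc_id for doc_id, _ in reranked])  # ['d2', 'd3', 'd5', 'd1', 'd4']
```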